Abstract Text (1) :

External memory engine selectable pipeline architecture provides external memory to 
a multi-thread packet processor which processes data packets using a multi-threaded 
pipelined machine wherein no instruction depends on a preceding instruction because 
each instruction in the pipeline is executed for a different thread. The route 
switch packet architecture transfers a data packet from a flexible data input 
buffer to a packet task manager, dispatches the data packet from the packet task 
manager to a multi-threaded pipelined analysis machine, classifies the data packet 
in the analysis machine, and modifies and forwards the data packet in a packet
manipulator. The route switch packet architecture includes an analysis machine 
having multiple pipelines, wherein one pipeline is dedicated to directly 
manipulating individual data bits of a bit field, a packet task manager, a packet 
manipulator, a global access bus including a master request bus and a slave request 
bus separated from each other and pipelined, an external memory engine, and a hash 
engine. 

Brief Summary Text (2) : 

This invention generally relates to the field of data communications and data 
processing architectures. More particularly, the present invention relates to a 
novel external memory engine (EME) selectable pipeline architecture for a multi- 
thread packet processor which processes data packets using a multi-threaded 
pipelined machine wherein no instruction depends on a preceding instruction because 
each instruction in the pipeline is executed for a different thread. 

Brief Summary Text (6) :

Further enhancements in processor throughput include modifications to the processor 
hardware to increase the average number of operations executed per clock cycle. 
Such modifications may include, for example, instruction pipelining, the use of
cache memories, and multi-thread processing. Pipeline instruction execution allows 
subsequent instructions to begin executing before previously issued instructions 
have finished. Cache memories store frequently used and other data nearer the 
processor and allow instruction execution to continue, in most cases, without 
waiting the full access time of a main memory. Multi-thread processing divides a
processing task into independently executable sequences of instructions called
threads. When the processor recognizes that an instruction in one thread (the first
thread) has caused it to be idle, for example because of memory latency, it
switches to an instruction from another thread (the second thread) that is
independent of the former instruction. At some point, the threads that had caused
the processor to be idle will be ready and the processor will return to those
threads. By switching from one thread to the next, the processor can minimize the
amount of time that it is idle.

Brief Summary Text (10): 

Methods and apparatuses consistent with the principles of the present invention, as 
embodied and broadly described herein, provide an EME selectable pipeline 
architecture to a multi-thread packet processor that processes data packets using a 
multi-threaded pipelined machine wherein no instruction depends on a preceding 
instruction because each instruction in the pipeline is executed for a different 
thread. The multi-thread packet processor transfers a data packet from a flexible
data input buffer to a packet task manager, dispatches the data packet from the 
packet task manager to a multi-threaded pipelined analysis machine, classifies the 
data packet in the analysis machine, and modifies and forwards the data packet in a
packet manipulator. The multi-thread packet processor includes an analysis machine 
having multiple pipelines, wherein one pipeline is dedicated to directly 
manipulating individual data bits of a bit field, a packet task manager, a packet 
manipulator, a global access bus including a master request bus and a slave request 
bus separated from each other and pipelined, an external memory engine, and a hash 
engine.

Detailed Description Text (4) : 

As shown in FIG. 1, an embodiment of the route switch packet architecture according 
to one aspect of the invention comprises Bi-directional Access Port (BAP) 10, Host 
Packet Injection (HPI) 14, Flexible Data Input Buffer (FDIB) 20, Test 28, Clocks &
PLLs 30, Analysis Machines (AMs) 42, 56, 70, 84, Packet Task Manager (PTM) 98,
Global Access Buses (GAB) 108, 110, 112, 114, 116, 118, External Memory Engines 
(EME) 120, 156, Internal Memory Engines (IME) 122, 152, Packet Manipulator (PM) 
126, Hash Engine (HE) 158, Centralized Look-Up Engine Interface (CIF) 160, Flexible 
Data Output Buffer (FDOB) 162, and Search/Results/Private 166, 168. With the 
exception of Search/Results/Private 166, 168, the combination of the above 
described elements may be considered a multi-thread packet processor. 

Detailed Description Text (5) : 

BAP 10 is operationally connected to each of the above described elements of the 
multi-thread packet processor. BAP 10 supports accesses to and from a generic host 
and peripheral devices. The multi-thread packet processor may be configured as the 
arbiter of the BAP bus. Each element is capable of interfacing via one or more GABs 
108, 110, 112, 114, 116, 118. Each AM 42, 56, 70, 84 may be configured with 32 
independent threads used for packet processing. The packet processing effected by 
AMs 42, 56, 70, 84 involves determining what packets are and what to do with them. 
AMs 42, 56, 70, 84 do not modify packets. All modifications of a packet are 
effected in PM 126, which may be configured as a programmable streaming packet 
modification engine. PM 126 has the ability, when directed, to forward a packet, 
drop a packet, or execute a set of instructions for modifying and forwarding a 
packet. Control is passed to PM 126 from PTM 98. PTM 98 is configured as the
multi-thread packet processor mechanism for getting packets from FDIB 20, 
dispatching them to AMs 42, 56, 70, 84, and finally dispatching them to PM 126. 
EMEs 120, 156 are resources shared by AMs 42, 56, 70, 84 and PM 126. IMEs 122, 152 
are resources shared by AMs 42, 56, 70, 84 and PM 126 that each contain an internal 
memory that is capable of reads, writes, read/clear, atomic addition, and atomic 
statistics addition operations through a GAB connection. HE 158 is configured as a 
resource shared by AMs 42, 56, 70, 84 that hashes up to a 64-bit value down to 24 
bits or less after a predetermined number of clock cycles. CIF 160 is configured as 
a resource shared by AMs 42, 56, 70, 84 that provides an interface to an external 
CLUE for centralized lookups. FDOB 162 is configured as a semi-configurable packet 
output interface whose main function is to interface PM 126 to an external system. 

Detailed Description Text (6) : 

The multi-thread packet processor is configured as a complex packet processor and 
incorporates a program downloaded to its instruction memories. The processor also 
incorporates global register configurations set for an application. Simple data 
structures in private, results, and statistics memory as well as complex search 
memory data structures are generally initialized. The results and search memory 
structures may be routinely updated by the control processor with new routing 
information, as it becomes available. 

Detailed Description Text (7) : 

The multi-thread packet processor is configured as a multi-layer packet processor. 
In other words, the multi-thread packet processor is configured for providing 
packet transfer capabilities in network communication Layers 1 to 4.

Detailed Description Text (16) :

As a multi-layer packet processor, one function of the multi-thread packet 
processor is to lookup, process, and forward packets. The forwarding performance of 
the multi-thread packet processor is directly related to the maximum rate at which 
the minimum size packet can be presented, processed and forwarded. The minimum size 
Internet protocol (IP) packet is strictly an IP header of 20 bytes, although this
packet is highly unlikely since 60-70% of backbone traffic is normally TCP. The
typical minimum size packet is a TCP ACK packet, which contains a 20-byte IP header
and a 20-byte TCP header, totaling 40 bytes. The multi-thread packet processor is
capable of handling both cases. The multi-thread packet processor is designed for
up to an internal 250 MHz operation, with external memory and I/O speeds of up to
200 MHz. This provides roughly 16.5 million packets per second (MPPS) with 60
instructions per packet forwarding decision, adequately forwarding OC-192c line
rate IP traffic for packets greater than or equal to 64 bytes.

Detailed Description Text (17): 

In a packet processor, there is no explicit relationship from one packet to another 
packet except for the sequence of packets. The packets may be dispatched to 
multiple processing units or to multiple threads on a pipelined processing engine, 
as long as the packet sequence is maintained. Because of this, the multi-thread 
packet processor may be partitioned into multiple packet processing units, each 
being multi-threaded to keep all execution pipelines fully operating. Since this is 
a hardware partitioning, the packet sequencing is kept in hardware via PTM 98. As 
previously mentioned, the multi-thread packet processor may be designed for up to 
250 MHz with 4 packet processing units providing 16.5 MPPS with 60 instructions 
used per packet forwarding decision. 
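
The relationship between clock rate, number of processing units, and instructions
per packet can be sketched as follows. This is a minimal illustration, assuming
each processing unit completes one instruction per clock cycle; that assumption and
the function name are not taken from the description. Under it the ideal bound
comes out near 16.7 MPPS, slightly above the roughly 16.5 MPPS quoted above.

```python
def packets_per_second(clock_hz, units, instructions_per_packet):
    """Idealized forwarding rate.

    Assumes every processing unit completes one instruction per clock
    cycle (an assumption of this sketch, not a figure from the text).
    """
    return clock_hz * units / instructions_per_packet


# 250 MHz clock, 4 packet processing units, 60 instructions per decision
mpps = packets_per_second(250e6, 4, 60) / 1e6
print(f"{mpps:.2f} MPPS")
```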

Detailed Description Text (18): 

Because the multi-thread packet processor processes the packets, it includes search 
capabilities. A common search metric used is the number of lookups per second the 
processor is capable of performing. The metric is typically bound, so that relative 
performance can be measured. Lookups using the radix-4 method can be effectively 
used in the routing of IP packets. The number of 24-bit radix-4 lookups for the 
multi-thread packet processor is a direct relation of the number of memory accesses 
EMEs 120, 156 are able to do per second. (The lookup functionality is part of the
External Memory Engine submodule.) 

Detailed Description Text (21) : 

BAP 10 may be designed for access by a general-purpose processor. All memory and 
register locations in the multi-thread processor address space are accessible from 
BAP 10. In an effort to make BAP 10 adaptable to future requirements, BAP 10 may be 
available to AMs 42, 56, 70, 84 with the intention of reading status information 
from external peripheral devices. One application is the reading of external queue 
depths for use in implementing intelligent drop mechanisms. It is assumed that 
these algorithms only need to access the peripheral bus periodically. Thus, the 
interface can be shared with arbitrated host accesses. If host accesses are limited 
once a system is in a steady state, the multi-thread packet processor is capable of 
supporting accesses up to once per packet. At 16 million packets per second (MPPS), 
this equates to 16 million peripheral accesses per second. Thus, the multi-thread
packet processor's 250 MHz operation allows up to 15 cycles per access.
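
The cycle budget per peripheral access follows from dividing the clock rate by the
access rate. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def cycles_per_access(clock_hz, accesses_per_second):
    """Whole clock cycles available between consecutive accesses."""
    return clock_hz // accesses_per_second


# 250 MHz core clock, one peripheral access per packet at 16 MPPS
print(cycles_per_access(250_000_000, 16_000_000))  # 15
```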

Detailed Description Text (22) : 

BAP 10 is configured as a shared multiplexed address and data bus that supports 
accesses to and from a generic host and peripheral devices. BAP 10 contains Global 
Registers 12, which include configuration and status registers that are global to 
the multi-thread packet processor. Registers that are specific to an element's 
function are contained in that element and accessible via one of the element's GAB 
interfaces. The operation of BAP 10 is controlled by BAP Global Registers 12. These 
registers include the source address, destination address, status register,
interrupt vector, transfer size register, and several others. BAP 10's interface to
a host uses a chip select and ready control handshaking mechanism, allowing BAP 10 
to interface with an external host operating at an unrelated asynchronous 
frequency. BAP 10 interfaces to all of the multi-thread packet processor's elements 
on each of the internal GABs 108, 110, 112, 114, 116, 118. BAP 10 provides direct 
accesses to all internal memory and register locations for normal read and write 
operation types. 

Detailed Description Text (23) : 

The multi-thread packet processor functions as the arbiter of the BAP bus. 
Generally, a host requests and is granted access to BAP 10. A configuration 
register is used to assign priority either to the generic host to access the multi- 
thread packet processor or for AMs 42, 56, 70, 84 to access peripheral devices. A 
default priority is given to the generic host at reset which facilitates the 
downloading of initial configuration data. After the configuration process is 
complete, the host sets the configuration register to give priority to AMs 42, 56, 
70, 84. The host is still guaranteed a minimum access rate. The multi-thread packet 
processor may initiate access to peripherals and, as BAP 10 arbiter, the multi- 
thread packet processor does not need to request and be granted BAP 10 to access 
peripherals. The request/grant is only for the generic host. BAP 10 also provides 
the reset logic and buffering for the multi-thread packet processor. 

Detailed Description Text (25) : 

HPI 14 is configured to be used by an external host to inject a packet into the 
multi-thread packet processor stream. HPI 14 includes Control Memory 16 and Packet 
Memory 18, and functions in the same manner as the FDIB on the Packet Input and 
Packet Data GABs. Both operate as special FIFOs (first in first outs) accessed by 
PTM 98, AMs 42, 56, 70, 84 and PM 126. HPI 14 has priority over FDIB 20 for packet 
insertion that is handled by PTM 98. HPI 14 is configured as a slave device to BAP 
10. Because HPI 14 may not support burst mode reads, BAP 10 writes one 64-bit data 
word at a time to HPI 14.

Detailed Description Text (27): 

FDIB 20 is configured as a packet input interface. Generally, packet data and 
control information are pushed down to FDIB 20. FDIB 20 is configured as a single 
port with the capability of supporting 32 or 64-bit width operations. FDIB 20 
performs packet master sequence generation and tagging for the inbound interface 
coordinating with up to three other multi-thread packet processors. 

Detailed Description Text (28) : 

FDIB 20 also contains the main packet buffering for the multi-thread packet 
processor. FDIB 20 includes four Packet Memories 26. Each of these memories may be
configured as a 512 × 128-bit dual port memory device that is segmented into
512 64-byte buffers. Each buffer has a page descriptor word contained in a separate
512 × 27 dual port memory. As pages fill, the descriptors are parsed and packet
descriptors are generated with information including error-type (e.g., 3 bits), the
length of the packet (e.g., 13 bits) as calculated by FDIB 20, and the master
sequence number (e.g., 12 bits). Additionally stored are the receive port (e.g., 4
bits) and the address of the first page of the packet. All FDIB Packet Memories 26
and configuration registers are accessible by the host as well, with Packet 
Memories 26 being restricted to diagnostic mode access. 
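
The example field widths above (3 + 13 + 12 + 4 = 32 bits) can be illustrated with
a small pack/unpack sketch. The field order, the names, and the omission of the
first-page address (whose width is not stated) are assumptions made only for
illustration; the actual descriptor layout is not given in the text.

```python
from collections import namedtuple

PacketDescriptor = namedtuple(
    "PacketDescriptor", "error_type length sequence port"
)

# (name, width in bits). Widths follow the example values in the text;
# the most-significant-first ordering is an assumption of this sketch.
FIELDS = (("error_type", 3), ("length", 13), ("sequence", 12), ("port", 4))


def pack(desc):
    """Pack the descriptor fields into a single 32-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = getattr(desc, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Recover the descriptor fields from a packed 32-bit integer."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return PacketDescriptor(**values)


d = PacketDescriptor(error_type=0, length=40, sequence=1234, port=3)
assert unpack(pack(d)) == d
```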

Detailed Description Text (34): 

Test 28 houses test visibility multiplexing structures for routing the state 
machines and critical signals of the multi-thread packet processor as well as the 
AMs and PM instruction memories to the external test pins. This function is 
intended for use in debugging multi-thread packet processor operational faults. 

Detailed Description Text (36) : 

Clocks & PLLs 30 provide a repository for all functions of the multi-thread packet
processor dealing with clock buffering, synchronization, generation, and testing. 
This element contains phased lock loops, logic, and buffering necessary to create 
primary buffered clock domains of the multi-thread packet processor. Tight skew 
control of the clock inputs to interfacing devices is maintained in order to ensure 
proper multi-thread packet processor operation. Additionally, the multi-thread 
packet processor has 4 memory return clocks (1 per memory bank) that clock the
flip-flops attached to the primary inputs on the data bus of EMEs 120, 156.

Detailed Description Text (41) : 

AMs 42, 56, 70, 84 have no direct connection to external interfaces of the multi- 
thread packet processor. They interface to internal elements that may or may not 
have external connections. 

Detailed Description Text (44): 

Each AM includes packet pre-classification hardware. PTM 98 passes the length and
address of the first buffer page of a packet to an AM thread. The next available
thread takes the address and begins a fetch of the page into the Packet Header
Memory contained in the AM. While the transfer is occurring over the AM's Packet
Input GAB I/F, the pre-classification hardware snoops the data to classify the most
basic known types. The hardware classification may be programmable and may be
enabled or disabled. The concept of the hardware pre-classification is to aid the
AM in a "fast dispatch", saving instructions for more critical processing. As such,
pre-classification may be limited to well-known protocols that make up 90-95% of
the packet traffic. The pre-classification also aids in attempting to maintain line
rate for packets smaller than 64 bytes. By pre-classifying some of the small packet
types, fewer instructions can be used for these types, which in turn frees
processing power in the multi-thread packet processor and so supports line rate
for these packets as well.

Detailed Description Text (48) : 

The pipeline internal to the EME co-processor is 8 cycles counting the external
memory pipeline. There are an additional 2 cycles for synchronization into the EME
memory clock domain. The EME may run on a 200 MHz clock domain so that a
clock conversion factor of 250 MHz/200 MHz may be applied. This puts the pipeline
depth at 12.5 cycles. Additionally, two cycles for synchronization back into the
multi-thread packet processor clock domain and four cycles for traversing the GAB 
master and slave interfaces may be included. This provides a 16.5 cycle total for 
an EME pipeline bank. 
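
The clock-domain conversion step can be written out as a short calculation. The
sketch below covers only that step (EME-domain cycles expressed in core-clock
cycles); the helper name is hypothetical.

```python
def to_core_cycles(eme_cycles, core_mhz=250, eme_mhz=200):
    """Express a cycle count from the EME clock domain in core-clock cycles."""
    return eme_cycles * core_mhz / eme_mhz


# 8 pipeline cycles plus 2 synchronization cycles, all in the 200 MHz domain
eme_domain_cycles = 8 + 2
print(to_core_cycles(eme_domain_cycles))  # 12.5
```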

Detailed Description Text (55) : 

To combat this, the multi-thread packet processor allows direct manipulation of bit 
fields. The problem of bit field isolation, manipulation, and reintegration into 
the larger data item is handled by the underlying hardware rather than a sequence 
of instructions as would be done on a general purpose processor. The additional 
hardware increases the processing pipeline depth of each AM, but does not have a 
detrimental effect on the multi-thread packet processor throughput. For example, 
consider the problem of incrementing a 5-bit field within a word. The general- 
purpose processor generally needs to extract the field into a register, increment 
that register, and insert the field back. For AMs 42, 56, 70, 84, this function is 
effected using a single instruction: add D1[field], 1, D0[field]
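
For comparison, the extract, increment, and insert sequence that a general-purpose
processor performs can be sketched in software as below. The function name, the
64-bit word width, and the choice to wrap the field on overflow are assumptions of
this illustration; the description does not state how field overflow is handled.

```python
WORD_MASK = (1 << 64) - 1  # 64-bit data word assumed


def increment_field(word, lsb, width, amount=1):
    """Add `amount` to the bit field word[lsb : lsb + width], in place.

    Bits outside the field are left untouched; the field wraps on
    overflow (an assumption of this sketch).
    """
    field_mask = (1 << width) - 1
    positioned_mask = field_mask << lsb
    field = (word & positioned_mask) >> lsb            # extract
    field = (field + amount) & field_mask              # increment
    return ((word & ~positioned_mask) | (field << lsb)) & WORD_MASK  # insert


# Increment a 5-bit field located at bits 8..12
print(hex(increment_field(0xFFFF0000000000FF, lsb=8, width=5)))
```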

Detailed Description Text (58): 

These eight combinations, however, can be significantly reduced with some 
assumptions and restrictions. The SFS and FSS are essentially the same with one of 
the sources having a bit field, the other source at 64-bit and the destination at 
64-bit. By restricting the assembler to require that the FSS combination be used, 
SFS may be eliminated. A similar restriction forces SFF and FSF to only need FSF. 
The SSS can essentially be mapped to an FSS structure where the bit field of the
first source is the full 64 bits. Put differently, the first source argument is
always treated as a bit field. For the multi-thread packet processor, a 2-bit field
in the instruction selects the second source
as simple or bit field as well as selecting the destination as simple or bit field. 
AMs 42, 56, 70, 84 impose another restriction, that if both the second source and 
the destination are bit fields then they occupy the same bit lanes. The eight 
combinations become: 

Detailed Description Text (59) : 

This allows only four styles FSS, FSF, FFS, and FFF to be implemented in hardware 
and provides 7 out of 8 combinations. Making bit fields and memory both first class
objects has many benefits. First, because memory can be manipulated just
as readily as data registers, issues regarding loads, misalignments, or register 
optimizations are generally not factors. Furthermore, since the multi-thread packet 
processor provides bit field manipulation, the data can generally be manipulated in 
place rather than having to first isolate it in a general register. This has a 
significant effect on the number of instructions that may be executed to process a 
packet and thus an effect on the overall packet forwarding performance. Secondly, 
it is easier to write the code that processes packet data. This is important for 
packet processing applications, since most are written in assembly code. Thirdly, 
time to market is accelerated since the amount of code needed to manipulate unique 
data sizes is reduced. 

Detailed Description Text (62): 

The instruction word of each instruction contains a next PC field. The field
supplies the next PC for this thread if the condition specified by
the SETBRCC field of the instruction is met by the result of the operation. If the 
branch is taken, PC+1 is implicitly loaded into the implicit link register and the 
next PC field into the PC register. If the branch is not taken, the normal PC+1 
increment is loaded in the PC. The ability to branch on every instruction is an 
extremely powerful feature that reduces the code set for packet processing 
considerably. The multi-thread packet processor instruction set can be broken down
into the following classes of instructions:

Computational instructions
Two-argument instructions
Three-argument instructions
Atomic instructions
Flow control instructions
Load or store instructions
Search engine instructions
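
The per-instruction branch behavior described above can be summarized as a small
state update. This is a sketch with hypothetical names; it models only the PC and
implicit link register and treats the SETBRCC condition as an already-evaluated
boolean.

```python
def next_state(pc, link, next_pc_field, condition_met):
    """Return the (pc, link) pair after one instruction of a thread.

    If the branch condition is met, PC+1 goes to the implicit link
    register and the next PC field goes to the PC; otherwise the PC
    simply increments and the link register is unchanged.
    """
    if condition_met:
        return next_pc_field, pc + 1
    return pc + 1, link


print(next_state(pc=10, link=0, next_pc_field=42, condition_met=True))   # (42, 11)
print(next_state(pc=10, link=7, next_pc_field=42, condition_met=False))  # (11, 7)
```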

Detailed Description Text (68): 

The multi-thread packet processor instruction set may include a load-shift with 
carry instruction. This instruction performs a conditional shift operation on an 
index register based on the condition of a carry flag, the condition of the carry 
flag having been set by a previous arithmetic operation. The instruction also 
performs an indexed load operation using an index register. A binary search using 
the load-shift with carry instruction can be performed on a table in which the keys
are ordered for in-order traversal of the table. Each instruction loop for
traversal of the table normally requires two instructions: one instruction to 
perform a key comparison and conditionally set the carry flag or exit the loop if 
the key has been found; a second instruction that uses the shift left with carry 
instruction to load the next 'load' in the table, and conditionally exit the loop 
if the key is not found. This instruction can minimize the number of instructions 
required for a binary search and may be used for other types of searches. 
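
One way to picture a search driven by shifting a carry bit into an index is a
table laid out as an implicit binary tree, where the children of entry i sit at
2i and 2i + 1. The sketch below is a software analogy under that assumption; the
table layout, the 1-based indexing, and the function names are illustrative and
are not taken from the description of the instruction.

```python
def build_implicit_tree(sorted_keys):
    """Lay sorted keys out so an in-order walk of the implicit tree
    (children of index i at 2*i and 2*i + 1) visits them in order."""
    n = len(sorted_keys)
    tree = [None] * (n + 1)          # index 0 unused
    keys = iter(sorted_keys)

    def fill(i):
        if i <= n:
            fill(2 * i)
            tree[i] = next(keys)
            fill(2 * i + 1)

    fill(1)
    return tree


def search(tree, key):
    """Return the table index holding `key`, or None if it is absent."""
    index = 1
    while index < len(tree):
        entry = tree[index]                   # indexed load
        if entry == key:                      # compare; exit when found
            return index
        carry = 1 if key > entry else 0       # comparison sets the carry
        index = (index << 1) | carry          # shift left with carry
    return None


table = build_implicit_tree([10, 20, 30, 40, 50, 60, 70])
print(search(table, 50), search(table, 35))
```

Each pass through the loop corresponds to one compare step and one
load-and-shift step, mirroring the two-instruction loop described above.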

Detailed Description Text (70) : 

PTM 98 is the multi-thread packet processor mechanism for getting packets from FDIB 
20, dispatching them to AMs 42, 56, 70, 84, and ultimately dispatching them to PM 
126. PTM 98 is used for packet sequencing and for maintaining the flow of packets 
through the multi-thread packet processor. PTM 98 effectively carries out three 
basic functions:

1. Reading a 33-bit basic descriptor from FDIB 20 and storing it.
2. Passing some of this information to an AM to get a lookup started.
3. Merging original information obtained from FDIB 20 with the analysis results
   from the AM and sending this entire "job packet" to PM 126. This is done in
   sequence order, so that no previous packet that is done and ready to be sent
   is left waiting.
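
The in-order hand-off to PM 126 can be pictured as a reorder buffer keyed by
sequence number: analysis results may arrive in any order, but they are released
strictly in sequence. The sketch below is a software analogy only; the class name,
the dictionary storage, and the 12-bit wrap are assumptions, not the hardware
design.

```python
class Sequencer:
    """Releases completed packets strictly in sequence-number order."""

    def __init__(self, first_seq=0, seq_bits=12):
        self.next_seq = first_seq
        self.modulus = 1 << seq_bits   # sequence numbers wrap (assumed 12 bits)
        self.done = {}                 # seq -> analysis result awaiting release

    def complete(self, seq, result):
        """Record a finished analysis; return the packets now releasable."""
        self.done[seq] = result
        released = []
        while self.next_seq in self.done:
            released.append((self.next_seq, self.done.pop(self.next_seq)))
            self.next_seq = (self.next_seq + 1) % self.modulus
        return released


s = Sequencer()
print(s.complete(1, "b"))  # [] because packet 0 has not finished yet
print(s.complete(0, "a"))  # [(0, 'a'), (1, 'b')]
```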

Detailed Description Text (75) :

As shown in FIG. 3, the GAB is configured as a fully synchronous split operation
protocol that is separated into two sections: Master Request Bus (MRB) 306, 310 and 
Slave Result Bus (SRB) 308, 312. Each operation starts with a master request and an 
MRB arbiter 302 grant. The MRB registers the operation to the slave devices. The 
operation is completed by a slave request and SRB arbiter 304 grant. The SRB 
registers the data back to the masters. The MRB and SRB are separated from each 
other and are pipelined. This allows multiple master requests to fill the pipelines 
of the slave devices, which are typically co-processing units, and then wait for 
the return data. Since the multi-thread packet processor master devices are 
typically multi-threaded, multiple pipelined requests may occur from any given 
master. Each slave and master has a ready signal to indicate that it is ready for 
the next operation. Masters assert their ready to the SRB arbiter and slaves assert 
their ready to the MRB arbiter. It is up to the designer of the master or slave 
device to ensure that the ready signal is only asserted when the device is ready
for the operations of which it is capable. For example, if a GAB device typically
takes burst writes, then the ready signal should be asserted when there is enough
room for a burst. Since the arbiter knows which device a master wants to target and
has the slaves' ready signals, an additional level of arbitration can implicitly be
built in by not granting a master the GAB if the targeted slave is not ready.
Similarly, the SRB can implicitly hold off a slave for return data if the master
that is to receive the data is not ready. This should not occur since the master had
originally requested the operation.
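
The implicit arbitration on the MRB (no grant while the targeted slave is not
ready) can be sketched as a simple selection function. The fixed priority order
among requesting masters, the data shapes, and the device names in the example are
assumptions of this illustration; the actual arbitration scheme varies per bus.

```python
def grant_master(requests, slave_ready):
    """Pick the master to grant on the request bus for this cycle.

    requests:    list of (master, target_slave), highest priority first
                 (fixed priority is an assumption of this sketch).
    slave_ready: dict mapping slave name -> ready flag.
    Returns the granted (master, target_slave), or None if nobody can go.
    """
    for master, slave in requests:
        if slave_ready.get(slave, False):
            return master, slave
    return None


pending = [("AM0", "EME"), ("AM1", "IME")]
# AM0 is skipped because its target is busy; AM1 is granted instead.
print(grant_master(pending, {"EME": False, "IME": True}))
```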

Detailed Description Text (102) : 

This section details information for each of the eight Global Access Buses that 
make up the route switch packet architecture. All deviations from the standard
operation types and qualifiers are noted. The deviations are restricted to
different use of the operation qualifier fields and additional types. All data
movement is big endian aligned using the uppermost bits, except for the 36-bit
accesses, which should pad the upper 28 bits to zero. Connectivity between elements
of the multi-thread packet processor is accomplished through the use of GABs 104, 
106, 108, 110, 112, 114, 116, 118. GABs 104, 106, 108, 110, 112, 114, 116, 118 
include Packet Input GAB 106, Control GAB 108, Lookup GAB 110, Private Data GAB 
112, Statistics GAB 114, Results GAB 116, and Extension GAB 118. 

Detailed Description Text (103) : 

Packet Input GAB 106 provides an interface between AMs 42, 56, 70, 84 and FDIB 20. 
An AM pulls the first buffer of the packet from FDIB 20 into the Packet Header 
Memory of the AM. During the initial transfer, as noted above, the AM Hardware Pre- 
Classifier snoops the packet and provides information to the AM thread. Subsequent 
accesses deeper into the packet are under full thread control through a 
predetermined instruction. Packet Input GAB 106 is one of the GABs in the multi- 
thread packet processor used for the flow of packet data. Packet Input GAB 106 
transfers packet data from FDIB 20 to one of AMs 42, 56, 70, 84. Typically, the 
transfer is the first page of a packet, but AMs 42, 56, 70, 84 may access any 
number of words to the maximum burst in order to look deeper into a particular 
packet, if the protocol dictates. Packet Input GAB 106 has as its bus masters all
four AMs 42, 56, 70, 84, and as its slaves the FDIB 20 and HPI 14 submodules. The
Packet Input GAB MRB uses TDMr arbitration. This allows fair access among AMs 42,
56, 70, 84 while not starving BAP 10. Each AM is allocated one out of every four
cycles. BAP 10 is given 4 out of 256 possible time slices of the TDM and is the
default member of the round robin, i.e., BAP 10 wins round robin only if no AM is
requesting.
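
A time-sliced arbiter of this general shape can be sketched as follows. The text
gives only the proportions (one cycle in four per AM, 4 of 256 slices for BAP 10,
BAP as the round-robin default), so the positions of the BAP slices, the fallback
order when a slot owner is idle, and all names below are assumptions made for
illustration.

```python
AMS = ["AM0", "AM1", "AM2", "AM3"]
FRAME = 256
# Assumed slice positions: one taken from each AM's share per 256-slot frame.
BAP_SLOTS = {0, 65, 130, 195}


def slot_owner(cycle):
    """Return the requester that owns this time slice."""
    slot = cycle % FRAME
    if slot in BAP_SLOTS:
        return "BAP"
    return AMS[slot % 4]


def grant(cycle, requesting):
    """Grant the slice owner if it is requesting; otherwise fall back to the
    next requesting AM in round-robin order, and to BAP only if no AM is
    requesting."""
    owner = slot_owner(cycle)
    if owner in requesting:
        return owner
    start = cycle % 4
    for k in range(4):
        candidate = AMS[(start + k) % 4]
        if candidate in requesting:
            return candidate
    return "BAP" if "BAP" in requesting else None


everyone = {"AM0", "AM1", "AM2", "AM3", "BAP"}
grants = [grant(c, everyone) for c in range(FRAME)]
print(grants.count("BAP"), grants.count("AM0"))
```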

Detailed Description Text (105) : 

Control GAB 108 provides an interface between an AM and PTM 98. PTM 98 transfers 
packet length, input port, and the address of the first packet buffer in FDIB 20 of 
the packet. The AM is configured as both a master and a slave on Control GAB 108. 
The registers/memories of the AM are accessible via Control GAB 108 by BAP 10. The
multi-thread packet processor uses Control GAB 108 for the flow of control
information between various masters of the multi-thread packet processor. It is
used primarily for packet notification, sequencing, and internal descriptor
(message) passing. Control GAB 108 is also used for programming the instruction
memories and configuration information into the AMs 42, 56, 70, 84, PM 126 and PTM
98. Control GAB 108's bus masters are: all AMs 42, 56, 70, 84, PTM 98 and BAP 10
submodules. Control GAB 108's slaves are: all AMs 42, 56, 70, 84, PTM 98, PM 126,
FDIB 20 and HPI submodules.

Detailed Description Text (108): 

Lookup GAB 110 provides an interface to an EME 120, 156 for lookups, filters, and
memory accesses into the external memory. Lookup GAB 110 is primarily used for
connection of AMs 42, 56, 70, 84 to an EME 120, 156. EMEs 120, 156 are capable of
reads, writes, atomic/statistic arithmetic, search, and filter operations into their
external memory. Since the number of accesses to the external memory can approach 
the maximum transfer capabilities of Lookup GAB 110, an EME is the only slave 
member. There are no slave sub-devices and the maximum number of master sub-devices 
is 16 and mapped to each of the AM threads. Flexibility as to what is contained in 
the memory is left to the users of the multi-thread packet processor so all 
operations are supported, but normally lookup search/filter tables and data 
structures for an AM are maintained. Further flexibility is allowed by having a 
connection from PM 126 to allow access to EME memories as well, although PM 126 
access is direct and not over the GAB. A master connection over the GAB to BAP 10 
is also provided to allow search table programming and updates. The Lookup GAB MRB 
uses TDMr arbitration. This allows fair access among AMs 42, 56, 70, 84 while not 
starving BAP 10. BAP 10 may be given four out of 256 possible time slices of the 
TDM and is the default member of the round robin, i.e., BAP 10 wins round robin only
if no AM is requesting. The Lookup GAB SRB uses lowest priority arbitration since 
there is only the one slave member. The Lookup GAB data bus is 64-bits wide for 
lookup/filter keys and memory data. The MRB address bus to the EME is 21 bits to 
select the bank, region and the 32-64 bit word address in the 256k × 36 SRAM.
The SRB section of the bus also has a 64-bit data path. Additionally a 21-bit 
address bus is provided back from the EME for next lookup operations. This is for 
use in the CLUE, but can also be used for segmenting AM lookups. 

Detailed Description Text (109) : 

Private Data GAB 112 is the other GAB in the multi-thread packet processor used for 
the flow of packet data. Private Data GAB 112 transfers packet data from FDIB 20 to 
PM 126. Typically, the transfer is a burst of eight 64-bit words or page of packet 
data. For smaller packets and the last page of packets, PM 126 may request the 
number of words necessary to get to the end of the packet. Private Data GAB 112 may 
have as its bus masters: PM 126 and BAP 10. Private Data GAB 112 may have as its 
slaves: FDIB 20 and HPI 14. 

Detailed Description Text (112):

Statistics GAB 114 provides an interface from an AM to the statistics memory 124, 
154 within an IME 122, 152. The associated AM uses this interface to update the 
statistics for packets as they are processed. Statistics Data GAB 114 connects AMs 
42, 56, 70, 84 and PM 126 to an IME. The IME is capable of reads, writes, and 
atomic/statistic arithmetic operations into its memory. Since the number of 
accesses to the internal memory can approach the maximum transfer capabilities of 
Statistics Data GAB 114, the IME is the only slave member. There are no slave sub- 
devices and the maximum number of master sub-devices is 32 and mapped to each of 
the AM threads. Flexibility as to what is contained in the memory is left to the 
users of the multi-thread packet processor so all operations are supported, but 
normally local critical packet statistics are maintained. A master connection over 
the GAB to the BAP 10 is also provided to allow programming, updates, and statistic 
harvesting. The Statistics GAB MRB uses TDMr arbitration. This allows fair access 
among AMs 42, 56, 70, 84 and PM 126 while not starving BAP 10. PM 126 is granted 
every other cycle, with AMs 42, 56, 70, 84 sharing the other cycle one out of four 
except for the BAP cycles. BAP 10 may be given four out of 256 possible time slices 
of the TDM and is the default member of the round robin i.e., BAP 10 wins round 
robin only if no AM or the PM is requesting. The Statistics GAB SRB uses lowest 
priority arbitration since there is only one slave device. Both the MRB data bus 
and SRB data bus of Statistics GAB 114 are 64-bits wide. The MRB address bus to the 
IME is 11 bits to select the 32-64 bit word address in the 1k x 64 SRAM. No 
return SRB address path is necessary. The majority of the operation types supported 
are the standard read and write capabilities of any GAB. Additionally 
atomic/statistic arithmetic is supported. The operation qualifiers were also 
remapped on the MRB, as byte and 16-bit word accesses are not necessary, but 36-bit 
accesses and read/clear are. The SRB operation qualifiers are also remapped to 
indicate the type of operation that occurred 36-bit, 32-bit, or 64-bit and to 
provide condition codes back to the AM indicating the operation status. Condition 
codes are provided for zero, carry/stuck, and negative/link bit (sign bit set). 
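
The slot allocation on the Statistics GAB MRB can be sketched as a 256-entry TDM wheel. The source gives only the proportions (PM on every other cycle, the four AMs sharing the remaining cycles, four slices for BAP 10); the position of the BAP slices, the member names, and the fixed fallback order in `grant` are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional

SLOTS = 256
AMS = ["AM42", "AM56", "AM70", "AM84"]
# Assumption: the four BAP slices are spread evenly over odd-numbered slots.
BAP_SLOTS = {63, 127, 191, 255}


def build_wheel() -> list[str]:
    """Assign an owner to each of the 256 TDM time slices."""
    wheel = []
    am_turn = 0
    for slot in range(SLOTS):
        if slot % 2 == 0:
            wheel.append("PM")  # PM owns every other cycle
        elif slot in BAP_SLOTS:
            wheel.append("BAP")
        else:
            wheel.append(AMS[am_turn % len(AMS)])  # AMs take turns
            am_turn += 1
    return wheel


def slice_counts() -> Counter:
    """Count how many slices each member owns."""
    return Counter(build_wheel())


def grant(slot_owner: str, requesting: set[str]) -> Optional[str]:
    """Pick the winner for one slice.

    The slot owner wins if it is requesting. Otherwise the slice falls to
    another requester, and BAP wins only when no AM or the PM is requesting.
    """
    if slot_owner in requesting:
        return slot_owner
    others = [m for m in ["PM", *AMS] if m in requesting]
    if others:
        return others[0]  # simplification: fixed order, not a rotating pointer
    return "BAP" if "BAP" in requesting else None
```

With this layout the PM owns 128 slices, BAP owns 4, and each AM owns 31 of the remaining 124.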

Detailed Description Text (123): 

FIG. 4 shows a block diagram that depicts one implementation of the architecture of 
the EME. There are two asynchronous boundaries. The first is in the GAB controller, 
which synchronizes between the internal multi-thread packet processor clock 
(RSP2CLK) and the local clock (MEMCLK) to run the EME core. The second is in the 
high-speed access port (HSAP) controller for PM 126. 

Detailed Description Text (124): 

A separate clock input is used for the EME so that SSRAMs of various speeds can be 
used independent of the multi-thread packet processor's clock frequency. The 
asynchronous boundaries are bridged with asynchronous FIFOs that are deep enough to 
prevent latencies from reducing bandwidth. Pre-processing is applied to lookups and 
filters by most significant bit (MSB) aligning the key and calculating the first 
lookup address for selected searches. If the lookup/filter must continue in another 
EME, the key is least significant bit (LSB) aligned (post-processing after the 
pipeline) so the next EME receives another search. For the other search, the 
address remains the same and the key is MSB aligned. Burst reads and writes are 
preprocessed by generating incremented addresses so the pipeline receives a burst 
of single-address reads or writes. If there is a burst read, all the read data is 
accepted from a single bank before switching to the other bank to keep the burst 
read data contiguous. 
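
Two of the pre-processing steps above, MSB alignment of the key and expansion of a burst into single-address operations, can be sketched as follows. The 64-bit key field width is taken from the lookup data path described earlier; the function names and the tuple representation of an operation are assumptions for illustration only.

```python
KEY_FIELD_BITS = 64  # assumption: keys travel in the 64-bit lookup data path


def msb_align(key: int, key_len: int) -> int:
    """Shift a key_len-bit key so that its top bit sits at bit 63."""
    if not 0 < key_len <= KEY_FIELD_BITS:
        raise ValueError("key_len must be between 1 and 64")
    key &= (1 << key_len) - 1  # discard any bits above the stated length
    return key << (KEY_FIELD_BITS - key_len)


def expand_burst(op: str, start_addr: int, count: int) -> list[tuple[str, int]]:
    """Turn one burst into `count` single-address operations.

    Addresses are incremented by one word per operation, so the pipeline
    only ever sees single-address reads or writes.
    """
    return [(op, start_addr + i) for i in range(count)]
```

For example, a four-word burst read starting at word address 0x100 would reach the pipeline as four reads at 0x100 through 0x103.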

Detailed Description Text (128): 

The ALU performs all the arithmetic functions for atomic and statistical adds, 
including the "stickiness" feature, as well as address calculation for lookups and 
filters. When there is a hit in the write buffer while an atomic or statistical add 
is issued from the Out FIFO, a memory cycle is wasted as the operation travels from 
the outbound pipeline to the inbound pipeline, dropping the read data from external 
memory to use the data in the buffer instead. A large MUX before the ALU controls 
data flow, selecting the most recent data during back-to-back atomic operations 
using the same address. Output and input delay cells are added to improve 
setup/hold times in the read/write paths to external memory. There is a 2-to-1 MUX 
to select data for memory writes, using a memory control signal from a register 
bit. This signal is low when late-write SSRAMs are used, so the data is driven one 
clock cycle after the address. If a different memory is used where data must be 
driven two cycles after (i.e., burst mode SSRAMs), a register bit can be set to 
flip the MUX to select data from the next stage in the pipeline. Similarly, a MUX 
using a memory expansion signal selects which address and associated tag 
information corresponds to the incoming SSRAM data. When expanded memory is used, 
the address is delayed a couple clock cycles to match the extra external delay 
where one additional clock cycle is allowed for external address decode and data 
MUXing, and a second extra cycle to register the read data externally before it is 
supplied to the multi-thread packet processor. 
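
The data-forwarding behavior for back-to-back atomic operations can be illustrated with a toy model. This is a minimal sketch, assuming a write buffer keyed by address and a 32-bit counter that wraps on overflow; the class and method names are hypothetical, and the "stickiness" feature and pipeline timing are deliberately not modeled.

```python
class WriteBufferedMemory:
    """Toy model: a small write buffer in front of an external memory."""

    def __init__(self, width_bits: int = 32):
        self.memory: dict[int, int] = {}        # external SSRAM contents
        self.write_buffer: dict[int, int] = {}  # pending writes by address
        self.mask = (1 << width_bits) - 1

    def read(self, addr: int) -> int:
        # A write-buffer hit replaces the data read from external memory.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.memory.get(addr, 0)

    def atomic_add(self, addr: int, delta: int) -> int:
        # Use the most recent value, then place the result in the buffer.
        result = (self.read(addr) + delta) & self.mask
        self.write_buffer[addr] = result
        return result

    def drain(self) -> None:
        # Retire all pending writes to external memory.
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()
```

Two consecutive `atomic_add` calls on the same address see each other's results even though external memory is not updated until `drain` runs, which mirrors the role of the MUX that selects the most recent data.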

Detailed Description Text (154):

CIF 160 is configured as an AM shared resource that provides an interface to a 
Centralized Look Up Engine (CLUE) for centralized lookups. CIF 160 is capable of 
supporting 50 million 24-bit radix4 lookups into a 32 Mbyte memory interfaced to 
the CLUE that may be shared with up to three other multi-thread packet processors. 

Detailed Description Text (161):

FDOB 162 is arranged as a semi-configurable packet output interface. FDOB 162 is 
single ported with the ability to support 32 or 64 bit width operation. A single 
parity bit covering the output data is provided. The parity is host-selectable to 
even or odd parity. The interface is further extended by the multi-thread packet 
processor, through out-of-band outputs allowing multi-port operation, with a 
maximum of 16 ports. 

Detailed Description Text (162):

FDOB 162 performs the packet master sequence control for the outbound interface 
coordinating with up to 3 other multi-thread packet processors. FDOB 162's main 
function is to interface PM 126 to an external system. An output FIFO is provided 
to PM 126 with a memory configuration including an SSRAM. Each location has a 
22-bit status word contained in the memory structure that indicates the 
start-of-packet, end-of-packet, end of multi-cast packet, continuation-of-packet, 
packet-error, packet drop, valid byte count, port identification, and master 
sequence number. 

Detailed Description Text (163):

FDOB 162 may be configured to drop a packet that contains an error or to transmit 
the packet and set the control bits to reflect packet-error. The 12-bit master 
sequence number is used for sequencing packets between multiple multi-thread packet 
processors. A transfer out of an individual multi-thread packet processor in a 
master sequence mode occurs when the current master sequence number matches the 
master sequence number of a packet that wants to be transferred. The master 
sequence may be enabled or disabled through the use of an FDOB 162 Configuration 
Register. 
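
The master sequence mode can be sketched as a merge of per-processor output queues. The source states only that a packet is transferred when its master sequence number matches the current one; that the current number advances by one after each transfer and wraps at 12 bits is an assumption here, as are the function name and the queue representation.

```python
SEQ_BITS = 12
SEQ_MODULUS = 1 << SEQ_BITS  # a 12-bit master sequence number wraps at 4096


def merge_in_sequence(
    queues: list[list[tuple[int, str]]], start: int = 0
) -> list[str]:
    """Interleave per-processor output queues by master sequence number.

    Each queue holds (sequence_number, packet) pairs in that processor's
    own output order. A processor transfers its head packet only when the
    packet's sequence number equals the current master sequence number.
    """
    heads = [0] * len(queues)
    current = start % SEQ_MODULUS
    sent = []
    progressed = True
    while progressed:
        progressed = False
        for i, queue in enumerate(queues):
            if heads[i] < len(queue) and queue[heads[i]][0] == current:
                sent.append(queue[heads[i]][1])
                heads[i] += 1
                current = (current + 1) % SEQ_MODULUS  # assumed increment
                progressed = True
    return sent
```

If no queue head carries the current number, the loop stops, which models the outbound interface stalling until the expected packet arrives.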

CLAIMS:

1. A method for providing external memory services to a multi-thread packet 
processor comprising: decoding information data sequences for a data packet within 
the multi-thread packet processor to determine assigning an operation to one of a 
first external memory bank and a second external memory bank; executing a read 
operation by comparing a read address with addresses stored in a write buffer, 
initiating a read, and replacing read data with data in the write buffer; executing 
a write operation by comparing a write address with addresses stored in a write 
buffer when the write buffer is enabled, and writing data to a reserved location in 
the write buffer; executing an atomic add operation by performing a write operation 
in the write buffer if there is available space in the write buffer; executing a 
lookup operation by indexing into a memory bank based on a base address, a key, and 
a key length, and conducting flow and memory accesses of the memory bank; and 
returning a result data field to the multi-thread packet processor when one 
operation is completed. 

3. An apparatus for providing external memory services to a multi-thread packet 
processor, said apparatus comprising: a packet manipulator to decode information 
data sequences for a data packet within the multi-thread packet processor to 
determine assigning an operation to one of a first external memory bank and a 
second external memory bank; a write buffer to execute a read operation by 
comparing a read address with addresses stored in the write buffer, initiating a 
read, and replacing read data with data in the write buffer; and to execute a write 
operation by comparing a write address with addresses stored in the write buffer 
when the write buffer is enabled, and writing data to a reserved location in the 
write buffer; an arithmetic and logic unit operationally connected to said write 
buffer to perform an atomic add operation by performing a write operation in the 
write buffer if there is available space in the write buffer; and a loopback 
first-in-first-out unit operationally connected to said write buffer to return a 
result data field to the multi-thread packet processor when one operation is 
completed. 

4. An apparatus for providing external memory services to a multi-thread packet 
processor according to claim 3, further comprising: an analysis machine having 
multiple pipelines, wherein one pipeline is dedicated to directly manipulating 
individual data bits of a bit field; and a packet task manager operationally 
connected to said analysis machine. 

PRIMARY-EXAMINER: Ray; Gopal C. 

ATTY-AGENT-FIRM: Steubing McGuinness & Manaras LLP 

ABSTRACT: 

External memory engine selectable pipeline architecture provides external memory to 
a multi-thread packet processor which processes data packets using a multi-threaded 
pipelined machine wherein no instruction depends on a preceding instruction because 
each instruction in the pipeline is executed for a different thread. The route 
switch packet architecture transfers a data packet from a flexible data input 
buffer to a packet task manager, dispatches the data packet from the packet task 
manager to a multi-threaded pipelined analysis machine, classifies the data packet 
in the analysis machine, modifies and forwards the data packet in a packet 
manipulator. The route switch packet architecture includes an analysis machine 
having multiple pipelines, wherein one pipeline is dedicated to directly 
manipulating individual data bits of a bit field, a packet task manager, a packet 
manipulator, a global access bus including a master request bus and a slave request 
bus separated from each other and pipelined, an external memory engine, and a hash 
engine. 

16 Claims, 7 Drawing figures 

Brief Summary Text (5):

VLIWs are highly effective for regular, loop-oriented tasks such as are typical of 
the performance-sensitive aspects of digital signal processing and other 
"number-crunching" applications. Many modern applications require that one processor serve 
a mixture of programming paradigms. For example, a real-time embedded DSP 
application mixes both DSP and control processing tasks. The latter tasks typically 
have little latent ILP. Multi-thread execution would better suit the application 
when it is not solely involved in time-critical DSP kernel inner loop execution. 

Detailed Description Text (4):

The instruction packet addressed by Fetch program counter (PC) and control unit 110 
in Instruction Dispatch/Decode unit 115 is fetched from program memory 105 into 
Instruction Dispatch/Decode unit 115. The Instruction Dispatch/Decode unit 115 
decodes these component instructions, schedules them and lastly dispatches them to 
the specific function unit (S, L, M or D) and side as indicated by the instruction. 
Up to 8 instructions may be dispatched per cycle, four per side and one per 
functional unit. The sides 130A and 130B have respective branch units 135A and 135B 
that process program branches. If a branch is taken, the corresponding branch unit 
135A or 135B directs the Fetch program counter and control unit 110 to start 
fetching from the new address. Interrupt processing occurs in interrupt masking and 
steering unit 125. If an interrupt is taken, then interrupt masking and steering 
unit 125 directs Fetch program counter and control unit 110 to start fetching from 
the address of a corresponding interrupt handler. If multiple branches and/or an 
interrupt are taken in one cycle, then a prioritization scheme is used to select 
one and the other events are either ignored or deferred. Pipeline state unit 120 
consists of typical state information kept in a pipelined processor such as a 
program counter and status information for each stage of the pipe. 

Detailed Description Text (9):

Processor 200 illustrated in FIG. 2 operates as follows. Fetch program counter and 
control units 110A and 110B each fetch a half packet of instructions (1/2 of a 
VLIW eight-instruction packet) from program memory 105 (which is now dual ported) 
into respective Instruction Dispatch/Decode units 115A and 115B. Instruction 
Adapter units 217A and 217B bind the instructions to the A and B sides 
respectively. Then each side's dispatch and decode logic 115A and 115B prepares the 
half packets for execution as per a "normal" C6x. The instructions are then 
dispatched to their respective sides (instructions on A side to the A functional 
units S1, L1, M1 and D1, and B instructions to the B units S2, L2, M2 and D2). From 
dispatch on, the processor acts as a "normal" C6x except for the handling of changes 
in control flow such as interrupts and branches. 

ART-UNIT: 273 

PRIMARY-EXAMINER: Kim; Kenneth S. 

ATTY-AGENT-FIRM: Marshall, Jr.; Robert D. Brady, III; W. James Telecky, Jr.; 
Frederick J. 

ABSTRACT: 

This invention is a very long instruction word data processor including plural data 
registers, plural functional units and plural program counters and is selectively 
operable in either a first or second mode. In the first mode, the data processor 
executes a single instruction stream. In the second mode, the data processor 
executes two independent program instruction streams simultaneously. In the second 
mode the data processor may respond to two instruction streams accessing only 
corresponding halves of the data registers and function units. Alternatively, the 
data processor may respond to a first instruction stream including instructions 
referencing the whole data processor employing A side function units by 
alternatively dispatching (1) instructions referencing the A side data registers 
and the A side function units and (2) instructions referencing the B side data 
registers and the B side function units. In the first mode, the data processor 
fetches N bits of instructions each cycle. In the second mode the data processor 
may fetch N bits of instructions for alternate program counters on alternate cycles 
or fetches N/2 bits of each of the first and second program counters. The data 
processor includes interrupt steering and masking control logic allowing 
instructions to control whether the first instruction stream or the second 
instruction stream receives interrupts. 

9 Claims, 4 Drawing figures 

Abstract Text (1): 

A command buffer for use in packetized DRAM includes a four stage shift register 
for sequentially storing four 10-bit command words. The shift register 
combines the four 10-bit command words into a single 40-bit command word and 
transfers the 40-bit command word to a storage register for processing by the DRAM. 
The shift register may then continue to receive and store subsequent 10-bit command 
words. The command buffer also includes circuitry for determining whether a command 
packet is intended for the memory device containing the command buffer or whether 
it is intended for another memory device. Specifically, a portion of the 40-bit 
command word from the storage register is compared to identifying data stored in an 
identifying latch. In the event of a match, a chip select signal is generated to 
cause the memory device to perform the function corresponding to other portions of 
the 40-bit command word. The identifying latch is programmed with the unique 
identifying data during power-up by storing the identifying data responsive to an 
initialization command packet. The shift register includes shift register circuits 
that are specifically adapted to operate at very high speeds. 
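
The accumulation of four 10-bit command words into one 40-bit command word can be sketched in software. The source does not state which of the four words occupies the most significant bits, so placing the first-received word highest is an assumption, and the class and attribute names are hypothetical.

```python
from typing import Optional

WORD_BITS = 10
WORDS_PER_COMMAND = 4


class CommandShiftRegister:
    """Collects 10-bit command words and latches a 40-bit command word."""

    def __init__(self) -> None:
        self.stages: list[int] = []
        self.storage_register: Optional[int] = None

    def shift_in(self, word: int) -> bool:
        """Store one 10-bit word; return True when a 40-bit word was latched."""
        self.stages.append(word & ((1 << WORD_BITS) - 1))
        if len(self.stages) < WORDS_PER_COMMAND:
            return False
        combined = 0
        for w in self.stages:  # assumption: first word is most significant
            combined = (combined << WORD_BITS) | w
        self.storage_register = combined
        self.stages = []  # free to receive the next command packet
        return True
```

Clearing the stages after the transfer corresponds to the shift register continuing to receive subsequent command words while the storage register holds the previous command.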

Brief Summary Text (15): 

The column address on bus 70 is applied to a column latch/decoder 100 which, in 
turn, supplies I/O gating signals to an I/O gating circuit 102. The I/O gating 
circuit 102 interfaces with columns of the memory banks 80a-h through sense 
amplifiers 104. Data is coupled to or from the memory banks 80a-h through the sense 
amps 104 and I/O gating circuit 102 to a data path subsystem 108 which includes a 
read data path 110 and a write data path 112. The read data path 110 includes a 
read latch 120 receiving and storing data from the I/O gating circuit 102. In the 
memory device 16a shown in FIG. 2, 64 bits of data are applied to and stored in the 
read latch 120. The read latch then provides four 16-bit data words to a 
multiplexer 122. The multiplexer 122 sequentially applies each of the 16-bit data 
words to a read FIFO buffer 124. Successive 16-bit data words are clocked through 
the FIFO buffer 124 by a clock signal generated from an internal clock by a 
programmable delay circuit 126. The FIFO buffer 124 sequentially applies the 16-bit 
words and two clock signals (a clock signal and a quadrature clock signal) to a 
driver circuit 128 which, in turn, applies the 16-bit data words to a data bus 130 
forming part of the processor bus 14. The driver circuit 128 also applies the clock 
signals to a clock bus 132 so that a device such as the processor 12 reading the 
data on the data bus 130 can be synchronized with the data. 
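
The read path's conversion of one 64-bit latched value into four sequential 16-bit words can be shown with a one-function sketch. The order in which the multiplexer selects the words is not given in the source; least significant word first is assumed here, and the function name is hypothetical.

```python
def split_read_latch(latched: int) -> list[int]:
    """Split a 64-bit latched value into four 16-bit words.

    Assumption: the multiplexer selects the least significant word first.
    """
    return [(latched >> (16 * i)) & 0xFFFF for i in range(4)]
```

Each returned word would then be clocked through the read FIFO buffer and driven onto the 16-bit data bus in turn.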

Brief Summary Text (17): 

As mentioned above, an important goal of the SyncLink architecture is to allow data 
transfer between a processor and a memory device to occur at a significantly faster 
rate. However, the operating rate of a packetized DRAM, including the packetized 
DRAM shown in FIG. 2, is limited by the time required to receive and process 
command packets applied to the memory device 16a. More specifically, not only must 
the command packets be received and stored, but they must also be decoded and used 
to generate a wide variety of signals. However, in order for the memory device 16a 
to operate at a very high speed, the command packets must be applied to the memory 
device 16a at a correspondingly high speed. As the operating speed of the memory 
device 16a increases, the command packets are provided to the memory device 16a at 
a rate that can exceed the rate at which the command buffer 46 can process the 
command packets. 

Detailed Description Text (10):

The remaining components of the command buffer 200 are the decoder 210, the ID 
Register 212, and the storage register 208 and comparator 214 which are shown as 
one block in FIG. 4. These components operate as explained above. However, the 
block diagram of FIG. 4 shows some additional signal inputs and outputs, namely, 
the SI and RESET* inputs and the SO output. All of these signal inputs and outputs 
are used during the initialization sequence. Specifically, at initialization, the 
RESET* input goes active low to load predetermined identification data, i.e., the 
number "63," into the ID register 212. The RESET* signal also clears all 40 stages 
of the storage register 208 so that a spurious command does not appear on the 
command bus 220. By setting the identification data in the ID register 212 to a 
known value, i.e., 63, the processor is able to subsequently load the ID register 
212 with identifying data that is unique to the memory device containing the 
command buffer 200. As mentioned above, the comparator 214 must generate a CHPSEL 
signal to allow the memory device to perform various functions. Included in these 
various functions is decoding the portion of the 40-bit command word that allows 
the decoder 210 to generate the LOADID signal. Thus, if the processor was not able 
to apply to the command buffer 200 a command packet containing the identifying data 
in the ID register 212, the comparator 214 would not generate the CHPSEL output. 
Without the CHPSEL output, the decoder 210 would not generate the LOADID output to 
load the identifying data into the ID register 212. However, the command packet 
initially contains the binary equivalent of 63 which is favorably compared by the 
comparator 214 to the "63" initial identifying data in the ID register 212. Thus, 
on this initialization command, the comparator 214 generates the CHPSEL signal 
which allows the decoder 210 to generate a LOADID signal that latches other 
portions of the 40-bit command word into the ID register 212 as the unique 
identifying data for the memory circuit containing the command buffer 200. 

Detailed Description Text (29):

The inputs to the NOR gate 376 will all be low if either input to each of three NOR 
gates 410, 412, 414 is high. Thus, the inputs to the NOR gate 376 will all be low 
if the Y<3> command bit matches the ID<3> bit, the Y<4> command bit matches the ID<4> 
bit, and the Y<5> command bit matches the ID<5> bit. All three inputs to the NOR 
gate 376 will also be low if the Y<0>, Y<1>, Y<6>, Y<2>, Y<3> and Y<4> command 
bits are all high. Therefore, the CHPSEL signal will be generated if either the 
Y<0:5> command bits match the ID<0:5> identifying bits or if the Y<0:6> command 
bits are all high. The Y<0:6> command bits will all be high whenever the Y<6> 
command bit is high and the Y<0:5> command bits correspond to number 63. As 
mentioned above, at power-up, the identifying data ID<0:5> are set to "63" (binary 
"111111"). Thus, when unique identification data is to be recorded in the ID 
register 212 (FIGS. 3 and 4), the processor generates a command packet in which the 
Y<0:6> bits are all high. As a result, the comparator circuit 214 generates a 
CHPSEL signal which allows the decoder 210 to output a LOADID signal. After the 
unique Y<0:5> bits have been stored in the ID register 212, they are thereafter 
compared with the Y<0:5> command bits and, in the event of a match, the CHPSEL 
signal is generated to allow the memory device containing the command buffer 200 to 
perform a function corresponding to other bits of the command word. 
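
The chip-select condition and the initialization sequence can be modeled at the behavioral level. In this sketch Y<0> is mapped to bit 0 of an integer, and `load_id` stands in for the decoder's LOADID path without modeling the command decode itself; both choices, and all names, are assumptions for illustration.

```python
INITIAL_ID = 63  # ID<0:5> value loaded at reset (binary 111111)


class CommandIdLogic:
    """Behavioral model of the ID register and comparator."""

    def __init__(self) -> None:
        self.id_register = INITIAL_ID

    def chip_select(self, y: int) -> bool:
        """Return True when CHPSEL would be generated.

        `y` holds command bits Y<0:6>, with Y<0> as bit 0 (assumed mapping).
        """
        y_low = y & 0x3F               # Y<0:5>
        all_high = (y & 0x7F) == 0x7F  # Y<0:6> all high
        return y_low == self.id_register or all_high

    def load_id(self, y: int, new_id: int) -> bool:
        """Model LOADID: latch new_id only if CHPSEL is generated."""
        if not self.chip_select(y):
            return False
        self.id_register = new_id & 0x3F
        return True
```

After reset, a command with Y<0:6> all high selects the device, which lets a unique ID be loaded; from then on a command whose Y<0:5> bits equal that ID selects it.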

Current US Original Classification (1):
711/154 

Prince, Betty, "High Performance Memories," John Wiley & Sons Ltd., West Sussex, 
England, 1996, pp. 143-146. 

ART-UNIT: 2186 

PRIMARY-EXAMINER: Kim; Matthew 
ASSISTANT-EXAMINER: Anderson; Matthew D. 
ATTY-AGENT-FIRM: Dorsey & Whitney LLP 

ABSTRACT: 

A command buffer for use in packetized DRAM includes a four stage shift register 
for sequentially storing four 10-bit command words. The shift register 
combines the four 10-bit command words into a single 40-bit command word and 
transfers the 40-bit command word to a storage register for processing by the DRAM. 
The shift register may then continue to receive and store subsequent 10-bit command 
words. The command buffer also includes circuitry for determining whether a command 
packet is intended for the memory device containing the command buffer or whether 
it is intended for another memory device. Specifically, a portion of the 40-bit 
command word from the storage register is compared to identifying data stored in an 
identifying latch. In the event of a match, a chip select signal is generated to 
cause the memory device to perform the function corresponding to other portions of 
the 40-bit command word. The identifying latch is programmed with the unique 
identifying data during power-up by storing the identifying data responsive to an 
initialization command packet. The shift register includes shift register circuits 
that are specifically adapted to operate at very high speeds. 

18 Claims, 13 Drawing figures 

DOCUMENT-IDENTIFIER: US 5313582 A 

TITLE: Method and apparatus for buffering data within stations of a communication 
network 



Abstract Text (1):

Method and apparatus are disclosed for buffering data packets in a data 
communication controller. The communication controller is interfaced with a host 
processor and includes a control unit for accessing a communication medium. Each 
data packet to be transmitted or received is assigned a packet number. Packet 
number assignment is carried out by a memory management unit within the 
communication controller which dynamically allocates to each assigned packet number 
one or more pages in a data packet buffer memory for the storage of the 
corresponding data packet. Upon issuing the assigned packet number, the physical 
addresses of the allocated pages of data packet buffer memory storage space are 
generated in a manner transparent to both the host processor and the control unit. 
Upon completion of each data packet loading operation, the corresponding packet 
number is stored in a packet number queue maintained for subsequent retrieval in 
order to generate the physical addresses at which the corresponding data packet has 
been stored. Also disclosed is a mechanism for automatically generating transmit 
interrupts to the host processor upon the completion of any preselected number of 
data packet transmissions determined by the host processor. 

Brief Summary Text (34): 

Preferably, the data storage locations and packet numbers are dynamically 
allocated. In such an embodiment, the number of data page storage locations 
required to store each data packet is first determined. The required number of free 
data page storage locations are then allocated for storing the data packet. 
Thereafter, the unique packet number is assigned to each allocated data page 
storage location. In addition to dynamic packet number and page storage location 
allocation, linear-to-physical address conversion is employed to create memory 
access windows in the data packet storage means. In this way, writing into or 
reading from the data packet storage means appears as if accessing a fixed memory 
storage space, when in fact, the actual physical storage locations being accessed 
are situated elsewhere in buffer memory, unbeknownst to both the processor and the 
medium access control unit. 
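
Dynamic page allocation combined with linear-to-physical address conversion can be sketched as follows. The page size, the free-list policy, and all class and method names are illustrative assumptions; the source specifies only that pages are allocated per packet number and that accesses appear to target a fixed linear window.

```python
PAGE_SIZE = 256  # bytes per page; an illustrative value, not from the source


class PacketBufferManager:
    """Toy memory management unit for a paged packet buffer."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages = list(range(num_pages))
        self.pages_by_packet: dict[int, list[int]] = {}
        self.next_number = 0

    def allocate(self, packet_bytes: int):
        """Return a new packet number, or None if too few pages are free."""
        needed = max(1, -(-packet_bytes // PAGE_SIZE))  # ceiling division
        if needed > len(self.free_pages):
            return None
        number = self.next_number
        self.next_number += 1
        self.pages_by_packet[number] = [
            self.free_pages.pop(0) for _ in range(needed)
        ]
        return number

    def physical_address(self, number: int, linear_offset: int) -> int:
        """Map an offset in the packet's linear window to a buffer address."""
        page_index, offset = divmod(linear_offset, PAGE_SIZE)
        return self.pages_by_packet[number][page_index] * PAGE_SIZE + offset

    def release(self, number: int) -> None:
        """Return the packet's pages to the free list."""
        self.free_pages.extend(self.pages_by_packet.pop(number))
```

Because the caller only ever supplies a packet number and a linear offset, the pages backing a packet need not be contiguous, which is the transparency property the passage describes.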

Detailed Description Text (7):

While packet numbers corresponding to transmit data packets D.sub.Ti can be 
assigned by the host processor (and by the medium access control unit for receive 
data packets D.sub.Rj), the memory management unit of the illustrated embodiment 
assigns all packet numbers. To achieve this function, the memory management unit 
further includes a packet number assignment unit 17 which accepts and decodes 
memory storage requests R.sub.Ti and R.sub.Rj from the host processor and medium 
access control unit, respectively. In response, the packet number assignment unit 
assigns a packet number to each corresponding data packet, for which a memory 
storage request has been made. In turn, packet numbers N.sub.Ti and N.sub.Rj are 
issued to the host processor and medium access control unit, respectively. In the 
illustrated embodiment, each packet number will be a unique number, digitally 
represented within the communication controller and host processor. 

Detailed Description Text (11):

The medium access control unit first reads out the i-th data packet number N.sub.Ti 
from the removal location of transmit packet number queue 4. This retrieved packet 
number is then provided to address conversion unit 8 which generates physical 
addresses A.sub.Ti' that specify the physical storage locations in which the data 
bytes of the corresponding data packet D.sub.Ti are stored. From these accessed 
storage locations in buffer memory, the data bytes comprising the transmit data 
packet D.sub.Ti are read out by the medium access control unit and subsequently 
placed on the communication medium 103. After transmission of data packet D.sub.Ti, 
the medium access control unit can write transmit status data into storage 
locations associated with the physical storage locations from which data packet 
D.sub.Ti was read out. After transmit status data is written into buffer memory 6, 
an interrupt to the host processor is generated. The host processor, maintaining a 
software queue of assigned packet numbers N.sub.Ti, gains access to transmit status 
data in these physical storage locations, by selecting packet number N.sub.Ti from 
the removal location in the software queue. The selected packet number N.sub.Ti is 
then converted into the allocated physical addresses A.sub.Ti' by the address 
conversion unit. After transmit status data is read and utilized by the host 
processor, the host processor issues a release command F(N.sub.Ti) to packet number 
assignment unit 7, in order to release the storage locations in buffer memory 6 
that have been allocated to packet number N.sub.Ti. In this way, these released 
storage locations will be free for future allocation to either transmit or receive 
data packets. 
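
The transmit sequence described above can be sketched as a small simulation. 
This is only an illustration under stated assumptions, not the controller's 
actual logic: the names BufferMemory and transmit_cycle are hypothetical, 
storage is modeled as dictionaries keyed by packet number (standing in for the 
address conversion unit), and the status value "pass" is a placeholder. 

```python
from collections import deque


class BufferMemory:
    """Toy model (assumption): storage is keyed by packet number only."""

    def __init__(self):
        self.packets = {}  # packet number -> data bytes
        self.status = {}   # packet number -> transmit status


def transmit_cycle(tx_queue, host_queue, memory, medium):
    """One MAC transmission followed by the host's status read and release."""
    n = tx_queue.popleft()            # MAC reads N_Ti from the removal location
    medium.append(memory.packets[n])  # data bytes are placed on the medium
    memory.status[n] = "pass"         # MAC writes transmit status (placeholder value)
    # An interrupt to the host processor would be generated at this point.
    m = host_queue.popleft()          # host selects N_Ti from its software queue
    status = memory.status.pop(m)     # host reads the transmit status
    del memory.packets[m]             # release command F(N_Ti) frees the storage
    return m, status
```

In this sketch the host and the medium access control unit never handle a 
physical address; both refer to the packet only by its packet number. 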

Detailed Description Text (12) : 

The communication controller of the present invention operates much in the same way 
for loading and unloading of receive data packets D.sub.Rj. For example, the medium 
access control unit issues a request R.sub.Rj to the memory management unit for 
allocation of a number of storage locations in buffer memory 6, sufficient to store 
the j-th incoming receive data packet, D.sub.Rj. In response to request R.sub.Rj, a 
packet number N.sub.Rj is assigned to the j-th receive data packet, D.sub.Rj and 
then provided to the medium access control unit. The medium access control unit 
issues assigned packet number N.sub.Rj to the address conversion unit of the memory 
management unit, which generates a set of physical addresses A.sub.Rj that specify 
and provide access to storage locations in buffer memory 6, for storing receive 
data packet D.sub.Rj which corresponds to packet number N.sub.Rj. With access to 
allocated storage locations within buffer memory 6, the j-th receive data packet 
D.sub.Rj is read from the medium access control unit into the allocated storage 
locations. After loading receive data packet D.sub.Rj into buffer memory, 
corresponding packet number N.sub.Rj is placed into the insertion location of 
receive packet number queue 10 of the communication controller. Receive status data 
concerning the receive data packet D.sub.Rj, can be written into one or more of 
those storage locations which have been allocated to receive packet number 
N.sub.Rj. The nature of such receive status data can relate to the integrity or the 
type of data packet just received. Subsequently, an interrupt to the host processor 
will be generated indicating that unloading of a receive data packet can take place 
when desired by the host processor. Prior to the packet unloading operation, 
however, the host processor can read receive status data stored in buffer memory, 
in a manner similar to transmit status data storage and retrieval discussed above. 

Detailed Description Text (18) : 

In the first illustrated embodiment of the present invention, the packet length can 
vary between 64 and 1518 bytes per data packet, and buffer memory 6 (e.g. RAM) is 
preferably divided into eighteen pages, each of which contains 256 bytes of storage 
locations. In such an embodiment, each storage location is of sufficient bit length 
to store a byte of data. In order to maintain physical storage locations 
transparent to the host processor and the medium access control unit while 
permitting data byte transfer, a windowing-type memory accessing technique is 
employed. In essence, this technique involves the host processor and medium access 
control unit either writing a packet of data bytes into or reading a packet of data 
bytes from an apparently fixed, linearly addressed window of storage locations in 
buffer memory 6. In actuality, however, such data bytes are written into or read 
from other physical storage locations that have been preallocated to the 
corresponding packet number by the memory management unit of the communication 
controller. In order to clearly illustrate this feature of the present invention, 
reference is made to FIGS. 5 and 5A, in particular. 
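
As a worked example of the page arithmetic implied by these figures, the 
following sketch computes how many 256-byte pages a packet of a given length 
occupies. The function name pages_needed is hypothetical, and the sketch 
ignores any status or byte-count bytes that may be stored alongside the packet 
data. 

```python
PAGE_SIZE = 256  # bytes per page in the first illustrated embodiment


def pages_needed(packet_length: int) -> int:
    """Number of 256-byte pages required to hold packet_length data bytes."""
    if packet_length <= 0:
        raise ValueError("packet length must be positive")
    return -(-packet_length // PAGE_SIZE)  # ceiling division
```

Under these assumptions a minimum-length 64-byte packet fits in one page, 
while a maximum-length 1518-byte packet needs six pages. 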

Detailed Description Text (21) : 

Referring now to FIGS. 6 and 6A, the process of accessing buffer memory 6 through 
the memory access window W.sub.cpu is illustrated in connection with storing 
transmit data packets into one or more pages of the buffer memory. As illustrated 
in FIGS. 6 and 6A, four transmit packets D.sub.T0 through D.sub.T3 are queued up in 
the transmit data packet queue 4 ready to be stored in buffer memory 6. One data 
packet at a time is assigned a unique packet number N.sub.Ti and then given a set 
of linear addresses, represented as A.sub.Ti for transmit data packet D.sub.Ti. As 
understood, the length of each data packet D.sub.Ti is proportional to the number 
of bytes contained in the data packet, and the more bytes a data packet contains, 
the greater the range of linear addresses A.sub.Ti required to write data packet 
D.sub.Ti into memory access window W.sub.cpu, defined above. For 
illustrative purposes only, transmit data packet D.sub.T0 contains six pages of 
data bytes. Thus, the range of linear addresses which need to be generated to write 
this data packet into memory access window W.sub.cpu begins at {000 00000000} and 
terminates at about {101 11111111} as shown. On the other hand, data packets 
D.sub.T3 and D.sub.T4 each contain about two and one-half pages of data bytes, and 
thus the range of linear addresses which need to be generated to write each data 
packet into memory access window W.sub.cpu begins at {000 00000000} and terminates 
at about {010 00111111} as shown. 
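
The brace notation used for linear addresses above (a three-bit page index 
followed by an eight-bit byte offset) can be reproduced with a short sketch. 
The helper names format_linear and linear_range are hypothetical and serve 
only to illustrate the arithmetic. 

```python
def format_linear(addr: int) -> str:
    """Render an 11-bit linear address as '{ppp oooooooo}' (page, offset)."""
    if not 0 <= addr < (1 << 11):
        raise ValueError("linear address must fit in 11 bits")
    page, offset = addr >> 8, addr & 0xFF
    return "{" + f"{page:03b} {offset:08b}" + "}"


def linear_range(n_bytes: int):
    """First and last linear address needed to write n_bytes into the window."""
    return format_linear(0), format_linear(n_bytes - 1)
```

For a packet that fills exactly six 256-byte pages (1536 bytes), linear_range 
returns {000 00000000} and {101 11111111}, matching the range given for the 
six-page packet. 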

Detailed Description Text (24) : 

Two important points should be made at this juncture regarding the present 
invention. First, the linear-to-physical address conversion process within the 
memory management unit is completely transparent to both the host processor and the 
medium access control unit. Consequently, neither the host processor nor the medium 
access control unit knows just where any data packet may be stored in buffer memory; 
all that the host processor and medium access control unit have is a packet number 
assigned to a corresponding data packet stored somewhere in buffer memory 6. 
Secondly, the host processor and the medium access control unit are each capable of 
(i) accessing the packet numbers from the transmit and receive packet number queues 
9 and 10, and (ii) writing into and reading from fixed memory access windows 
(W.sub.cpu and W.sub.mac) defined by a delimited range of linear addresses. 
Consequently, buffer memory 6 is seen by the host processor and medium access 
control unit as a set of independent memory areas consisting of contiguous byte 
storage locations, having a length equal to the memory access windows W.sub.cpu and 
W.sub.mac, e.g. 2 kilobytes. 

Detailed Description Text (26) : 

In FIG. 7, CPU interface unit 30 generally comprises logic circuitry suitable for 
interfacing the address, data and control lines of system bus 37 with buffer memory 
34, memory management unit 35, and transmit and receive packet number queues 32 and 
33, involving multiplexers M.sub.2, M.sub.3, M.sub.4, and M.sub.6, as shown. While 
not shown, CPU interface unit 30 also includes a transmit interrupt storage 
register, a receive interrupt storage register, an MMU interrupt storage register 
and an interrupt generating circuit. Each storage register is adapted to store an 
interrupt code. The output of each storage register is read by the interrupt 
generating circuit and depending on the content of what is read, it generates a 
respective interrupt which is provided to the host processor over a designated line 
38. As will be described in greater detail hereinafter, interrupt codes for 
transmit and receive interrupt storage registers are provided from transmit and 
receive packet number queues 32 and 33, respectively. The interrupt code for the 
MMU interrupt storage register is provided by memory management unit 35. The 
interrupt generating circuit is adapted to generate each respective interrupt to 
the host processor under a particular condition. The first condition is for 
generating a receive interrupt and occurs after the medium access control unit has 
written receive status bytes into buffer memory 34 after a data packet reception. 
The second condition is for generating a transmit interrupt and occurs after the 
medium access control unit has transmitted one or more transmit data packets, and 
it might be time to store more transmit data packets in buffer memory. The third 
condition is for generating an MMU interrupt and occurs after a requested free page 
becomes available in buffer memory 6 and a packet number is assigned to a transmit 
data packet. When the low-level driver executed by the host processor receives an 
interrupt, it will instruct the host processor to determine the source of the 
interrupt. The interrupt generating mechanism will be described in yet greater 
detail hereinafter. 
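
The driver's step of determining the interrupt source can be sketched as 
follows. This is a hedged illustration only: the function name 
pending_sources is hypothetical, and the convention that a nonzero interrupt 
code marks a pending interrupt is an assumption, since the encoding of the 
interrupt codes is not given here. 

```python
def pending_sources(tx_reg: int, rx_reg: int, mmu_reg: int) -> list:
    """Return the interrupt sources whose storage register holds a nonzero code.

    Assumption: a code of 0 means "no interrupt pending" for that register.
    """
    regs = {"receive": rx_reg, "transmit": tx_reg, "mmu": mmu_reg}
    return [name for name, code in regs.items() if code != 0]
```

A low-level driver built along these lines would read the three storage 
registers on each interrupt and service every source that is reported. 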

Detailed Description Text (48) : 

When a memory storage request R.sub.Rj from the medium access control unit appears 
at packet number assignment unit 62 as shown, a sequence of operations occurs. 
First, a "page request" signal is generated and transmitted to page allocation and 
management unit 61. In response to the page request signal, the second row of the 
table of FIG. 7C is searched to determine if there is a free page available for 
allocation to an available packet number. As in the case of transmit packet 
requests, if there is a free page available at the time of the page request signal, 
then the available packet number is assigned to the incoming data packet D.sub.Rj 
and then it is written into the third row of the table of FIG. 7C, below the free 
page. Then, page allocation and management unit 61 transmits to packet number 
assignment unit 62, a "page request granted" signal, and upon receipt thereof, the 
packet number assignment unit 62 counts a first page as having been allocated to 
the assigned packet number. At this stage of the process, the memory management 
unit does not know if a single page of buffer memory is sufficient to store the 
incoming data packet. However, presuming that a single page of buffer memory might 
be sufficient, packet number assignment unit 62 places the assigned packet number 
N.sub.Rj onto data line 53B. The assigned packet number is transmitted to MAC 
interface unit 31 and stored in the packet number storage register (not shown). The 
medium access control unit reads this register and uses the packet number to load 
the first page of data bytes into the first allocated page in buffer memory 34. The 
process by which packet loading occurs involves linear-to-physical address 
conversion using the packet number and address conversion unit 60. Notably, only 
linear addresses corresponding to a first page (e.g. 256 data bytes) are generated 
and provided to the address conversion unit in order to write in the first data 
page of the incoming data packet. The details of this address conversion process 
will be described hereinafter. 

Detailed Description Text (49) : 

If another page of buffer memory is required to store the complete incoming data 
packet, then the medium access control unit presents to the packet number 
assignment unit 62, a second request for an additional page of memory to be 
allocated to the originally assigned packet number. The above-described page 
allocation process is carried out by again transmitting a page request signal to 
page allocation and management unit 61. If an additional free page is found after 
searching the table of FIG. 7C, then this page is allocated to the originally 
assigned packet number, and the data in the table of FIG. 7C is used to update page 
allocation information contained in address conversion unit 60. Then, a page 
request granted signal is transmitted to packet number assignment unit 62. In 
response, the assigned packet number is again placed onto lines 53B and appears in 
the packet number storage register in the MAC interface unit. This prompts the 
medium access control unit to write the second page of the incoming data packet 
into buffer memory 34. This is achieved by simply providing to the address 
conversion unit, the assigned packet number and a set of linear addresses 
corresponding to the second page of data bytes within the incoming data packet. For 
each additional page needed to store the incoming data packet, the above process of 
additional page requisition, page allocation, address conversion information 
updating, linear address generation and packet number presentment, is performed. 
When sufficient pages have been allocated and the entire incoming data packet is 
received and stored into buffer memory 34, the data packet and page allocation 
table of FIG. 7C will be complete, and using this table, the information regarding 
address conversion will have also been completely updated. At this stage, the 
medium access control unit will write the assigned packet number N.sub.Rj into the 
insert storage location of second FIFO storage unit 33. In this way, when the 
assigned packet number N.sub.Rj is read out from FIFO storage unit 33 by the host 
processor during data packet unloading operations, this packet number and its 
complete range of linear addresses will simply ensure access to the corresponding 
data packet, wherever it may be physically stored in buffer memory. Thereafter, a 
receive interrupt to the host processor is generated automatically as described 
hereinabove, in order to notify the host processor that a receive data packet is 
stored in buffer memory 34 and is ready for unloading. 
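
The page-by-page allocation described above can be sketched with a toy 
allocation table. This is a minimal sketch under stated assumptions, not the 
circuit of FIG. 7C: the class name PageAllocator and its methods are 
hypothetical, the table is modeled as a list with one owner slot per physical 
page, and a linear search for the first free page is assumed. 

```python
class PageAllocator:
    """Toy page allocation table: one owner slot per physical page."""

    def __init__(self, num_pages: int):
        self.owner = [None] * num_pages  # None marks a free page
        self.pages_of = {}               # packet number -> ordered physical pages

    def request_page(self, packet_number: int):
        """Allocate one free page to packet_number; return its index or None."""
        for page, owner in enumerate(self.owner):
            if owner is None:
                self.owner[page] = packet_number
                self.pages_of.setdefault(packet_number, []).append(page)
                return page              # corresponds to "page request granted"
        return None                      # no free page is currently available

    def release(self, packet_number: int):
        """Free every page that was allocated to packet_number."""
        for page in self.pages_of.pop(packet_number, []):
            self.owner[page] = None
```

Each call to request_page models one page request for the same packet number, 
so a packet that spans several pages accumulates an ordered list of physical 
pages that need not be contiguous. 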

Detailed Description Text (55) : 

As illustrated in FIGS. 6, 6A and 7A, each linear address has a first linear address 
component A.sub.1 and a second linear address component A.sub.2. As discussed 
hereinabove, the first linear address component represents, within the memory 
management unit, the page of a particular data byte within the data packet, whereas 
the second linear address component represents the location of the particular data 
byte within the specified page. Using this relationship, Address Conversion Unit 60 
is reduced to generating a physical address A' which also has two components: the 
first physical address component A.sub.1 ' being the physical page location 
C.sub.K, and the second physical address component A.sub.2 ' being the physical 
location of each byte within the physical page location C.sub.K. This process is 
achieved for transmit data packets (i) by using the packet number and the address 
conversion table of FIG. 7B, to convert "on the fly" the three-bit linear address 
component A.sub.1 into the five-bit physical address component C.sub.K, to which 
linear address component A.sub.1 has been allocated; and (ii) by simultaneously 
passing the eight-bit linear address component A.sub.2 to the output, to provide 
the eight-bit physical address component A.sub.2 '. The result is a thirteen-bit 
physical address A.sub.Ti ' ={C.sub.K, A.sub.2 } where C.sub.K represents the 
five most significant bits and A.sub.2 represents the eight least significant bits. 
The thirteen-bit physical address A.sub.Ti ' is then provided to the address port 
of buffer memory 34, to facilitate reading out and writing in data bytes of the 
corresponding data packet. The above address conversion process is performed in the 
same manner for each receive data packet D.sub.Rj, in which each linear address 
A.sub.Rj ={A.sub.1, A.sub.2 } is converted into physical address A.sub.Rj ' = 
{C.sub.K, A.sub.2 }, defined above. 
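
The linear-to-physical conversion of the first embodiment can be sketched in a 
few lines. The function name to_physical and the table layout (a dictionary 
mapping each packet number to its ordered list of physical pages) are 
assumptions made for illustration; they stand in for the address conversion 
table of FIG. 7B, whose exact organization is not reproduced here. 

```python
def to_physical(table: dict, packet_number: int, linear: int) -> int:
    """Convert an 11-bit linear address into a 13-bit physical address."""
    if not 0 <= linear < (1 << 11):
        raise ValueError("linear address must fit in 11 bits")
    a1, a2 = linear >> 8, linear & 0xFF  # 3-bit page index, 8-bit byte offset
    c_k = table[packet_number][a1]       # 5-bit physical page for that linear page
    if not 0 <= c_k < (1 << 5):
        raise ValueError("physical page must fit in 5 bits")
    return (c_k << 8) | a2               # the offset passes through unchanged
```

For example, if packet number 2 has been allocated physical pages 17, 4 and 
30 in that order, linear page 1 with offset 255 maps to physical page 4 with 
the same offset. 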

Detailed Description Text (59) : 

As illustrated in FIGS. 9 and 9A, address conversion in the second embodiment is 
different in one important respect. That is, each data packet written into or read 
from a memory access window W.sub.cpu or W.sub.mac is about to be or has been 
stored within a single page of buffer memory, which in the exemplary embodiment 
has a length of 2 kilobytes. The details of this address conversion process will be 
described hereinafter in connection with the communication controller of the second 
embodiment. 

Detailed Description Text (69) : 

As illustrated in FIG. 10A, each linear address A.sub.Ti (and A.sub.Rj) has eleven 
bits which represent, within the memory management unit of the second embodiment, 
the physical location of the particular data byte within the data packet. The page 
within the buffer memory is not specified by these linear addresses. For transmit 
data packets, the address conversion process is achieved (i) by using the packet 
number and the address conversion table of FIG. 10B (or function C.sub.K =N.sub.Ti), 
to convert "on the fly" the packet number N.sub.Ti into the five-bit physical 
address component C.sub.K, to which the packet number N.sub.Ti has been 
pre-allocated; and (ii) by simultaneously passing to the output line 56, the 
eleven-bit linear address component A.sub.Ti which is representative of the 
eleven-bit address component A.sub.2. Each resulting sixteen-bit physical address 
A.sub.Ti ' ={C.sub.K, A.sub.Ti } is then provided to the address port of buffer 
memory 34 to facilitate reading out or writing in the bytes of the corresponding 
data packet. The above address conversion process is performed in the same manner 
for receive data packets D.sub.Rj, in which each eleven-bit linear address A.sub.Rj 
is converted into a sixteen-bit physical address A.sub.Rj ' ={C.sub.K, A.sub.Rj }. 
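
The second-embodiment conversion reduces to concatenating the page selected by 
the packet number with the eleven-bit linear address. The sketch below assumes 
the identity function C.sub.K =N.sub.Ti mentioned above rather than a lookup 
table, and the function name to_physical_v2 is hypothetical. 

```python
def to_physical_v2(packet_number: int, linear: int) -> int:
    """Build a 16-bit physical address from a packet number and linear address."""
    if not 0 <= packet_number < (1 << 5):
        raise ValueError("packet number must fit in 5 bits")
    if not 0 <= linear < (1 << 11):
        raise ValueError("linear address must fit in 11 bits")
    c_k = packet_number          # assumed identity mapping C_K = N_Ti
    return (c_k << 11) | linear  # 5 page bits followed by 11 offset bits
```

With this mapping, packet number 3 and linear address 0 give physical address 
6144, the first byte of the fourth 2-kilobyte page. 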

Detailed Description Text (91) : 

In FIG. 12B, the operations involved in unloading a transmit data packet from the 
buffer memory are illustrated. When the medium access control unit is not engaged 
in its data packet reception mode, it can unload a transmit data packet from buffer 
memory 34 provided there is a packet number in the removal storage location of the 
first FIFO storage unit (i.e., transmit packet number queue) 32. To determine if 
this condition exists, the "empty" signal transmitted from FIFO storage unit 32 to 
MAC interface unit 31 over line 45 is checked to determine its value. If it is 
logical "0", then there is at least one packet number N.sub.Ti in FIFO storage unit 
32; otherwise, if it is logical "1", then the FIFO storage unit is empty and no 
transmit packet unloading is possible. When a packet number is present, the medium 
access control unit first reads a packet number from the removal location of FIFO 
storage unit 32. This packet number is then transmitted to the packet number input 
port of the memory management unit, to generate physical addresses corresponding to 
the stored data packet. Then the byte count in the third and fourth byte storage 
locations in the memory access window W.sub.mac is read out by the medium access 
control unit and stored. Notably, this byte count indicates what the length of the 
range of linear addresses must be in order to read out the data bytes of the stored 
packet. The data bytes of the stored packet are then read (i.e., copied) from the 
memory access window W.sub.mac by generating linear addresses indicated by the byte 
count. These unloaded data bytes are transmitted over the communication medium in 
the normal course. If the data transmission is successful, then the medium access 
control unit may simply release the pages allocated to the unloaded transmitted 
data packet. Alternatively, it can write transmit "pass" status bytes into the first 
and second byte locations of window W.sub.mac, for the host processor. 
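
The unloading steps above can be sketched as follows. Several details are 
assumptions made only for illustration and are not stated in the text: the 
byte count is taken to be little-endian, it is taken to count data bytes only, 
and the data bytes are taken to begin at the fifth byte of the window. The 
function name unload_transmit and the windows dictionary (packet number to 
window contents) are likewise hypothetical. 

```python
from collections import deque


def unload_transmit(fifo: deque, windows: dict):
    """Return the data bytes of the next transmit packet, or None if none is queued."""
    empty = len(fifo) == 0     # models the "empty" signal from the FIFO storage unit
    if empty:
        return None            # logical "1": no transmit packet unloading possible
    n = fifo.popleft()         # read the packet number from the removal location
    window = windows[n]        # window contents selected by the packet number
    # Assumption: 16-bit little-endian byte count in the third and fourth bytes.
    count = int.from_bytes(window[2:4], "little")
    # Assumption: data bytes follow the two status bytes and the byte count.
    return bytes(window[4:4 + count])
```

The returned bytes are what would be placed on the communication medium; 
releasing the pages or writing "pass" status bytes would follow as a separate 
step. 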